Abstract
The acquisition of translation equivalents is often considered a special component of bilingual children’s vocabulary development, as bilinguals have to learn words that share the same meaning across their two languages. This study examined three contrasting accounts for bilingual children’s acquisition of translation equivalents relative to words that are first labels for a referent: the Avoidance Account whereby translation equivalents are harder to learn, the Preference Account whereby translation equivalents are easier to learn, and the Neutral Account whereby translation equivalents are neither harder nor easier to learn. To adjudicate between these accounts, Study 1 explored patterns of translation equivalent learning under a novel computational model, the Bilingual Vocabulary Model, which quantifies translation equivalent knowledge as a function of the probability of learning words in each language. Study 2 tested model-derived predictions against vocabulary data from 200 French–English bilingual children aged 18–33 months. Results showed a close match between the model predictions and bilingual children’s patterns of translation equivalent learning. At smaller vocabulary sizes, data matched the Preference Account, while at larger vocabulary sizes they matched the Neutral Account. Our findings show that patterns of translation equivalent learning emerge predictably from the word learning process, and reveal a qualitative shift in translation equivalent learning as bilingual children develop and learn more words.
Bilingual children must learn words that take a different form in each of their languages, but share the same or highly similar meanings. For instance, to refer to the same crisp red-skinned fruit, an English–French bilingual child must use the word “apple” when speaking English, and the word “pomme” when speaking French. These cross-language synonyms are known as translation equivalents (also called doublets; Umbel et al., 1992), and are observed amongst bilingual children’s first words (e.g., David & Wei, 2008; De Houwer, Bornstein, & De Coster, 2006; Pearson et al., 1995). Translation equivalents are thought to hold a special status in a bilingual’s developing lexicon due to the strong overlap in their semantics. For example, studies with bilingual toddlers show that the associative semantic properties of a word in one language facilitate the activation of its translation equivalent (e.g., Bilson et al., 2015; Floccia et al., 2020; Jardak & Byers-Heinlein, 2019). That is, upon hearing the English word “apple,” the corresponding French word “pomme” is more easily activated in bilinguals’ minds. In vocabulary acquisition, bilingual children must learn a first label for a referent (a “singlet”; Umbel et al., 1992) before they can learn its translation equivalent. Is translation equivalent learning different from singlet learning? The current paper contrasts three competing accounts: 1) translation equivalents are harder to learn than singlets (Avoidance Account), 2) translation equivalents are easier to learn than singlets (Preference Account), and 3) translation equivalents are as easy to learn as singlets (Neutral Account). To adjudicate between these accounts, we introduce the Bilingual Vocabulary Model, which provides a computational account of vocabulary learning, with parameters including bilinguals’ vocabulary in each language and their developmental level.
In Study 1, we use the Bilingual Vocabulary Model to derive a set of predictions, which we then test against vocabulary data from 200 18- to 33-month-old bilingual children in Study 2.
Early theories of bilingual development claimed that translation equivalents are conspicuously missing from bilingual children’s early vocabularies (e.g., Imedadze, 1978; Swain & Wesche, 1975; Volterra & Taeschner, 1978). The phenomenon of missing translation equivalents led theorists to propose that young bilingual children do not differentiate their languages, and thus tend to learn only a single word for each referent. This avoidance of translation equivalents was thought to be due to word learning biases such as mutual exclusivity, whereby children assume that a referent is only associated with one word at the basic level (Markman & Wachtel, 1988; Markman, 1992; 1994). For example, when monolingual children see a familiar object (e.g., a cup) next to a novel object (e.g., a garlic press) and hear a novel word like “wug,” they assume that “wug” refers to the garlic press, the object unknown to them, rather than to the cup, the object for which they already know the word.
Although mutual exclusivity is helpful for monolingual vocabulary acquisition, its use is more complex for bilingual vocabulary acquisition (Byers-Heinlein & Werker, 2009; Davidson & Tell, 2005; Houston-Price, Caloghiris, & Raviglione, 2010). When encountering a potential singlet, mutual exclusivity would be equally useful for bilinguals as it is for monolinguals, supporting them in associating an unlabeled referent with a novel word. However, a strong form of mutual exclusivity might prevent bilinguals from associating a translation equivalent word with its referent, given that in this case the referent is already associated with another word (albeit in the other language). Thus, mutual exclusivity could prevent bilinguals from acquiring translation equivalents, leading to an abundance of singlets in their vocabularies.
Contrary to earlier studies, more recent work has indicated that bilinguals do understand and produce translation equivalents from early in development (David & Wei, 2008; De Houwer et al., 2006; Holowka et al., 2002; Pearson et al., 1995; Legacy et al., 2017). Indeed, experimental work has suggested that bilingual experience in infancy might not support the development of one-to-one mapping biases such as mutual exclusivity, at least in early infancy. For example, when hearing a novel word like “nil,” monolingual children aged 17–22 months looked towards a novel object rather than a familiar object, but bi- and multilingual children looked similarly at both objects (Byers-Heinlein & Werker, 2009; 2013; Houston-Price et al., 2010). A recent meta-analysis also indicated that bilingual children show mutual exclusivity to a weaker degree than monolinguals (Lewis et al., 2020).
Overall, converging evidence refutes the position that a strong form of mutual exclusivity prevents bilinguals from acquiring translation equivalents. Nonetheless, it leaves open the possibility that translation equivalents are acquired less readily than singlets, even if they are not completely avoided. If bilingual children avoid lexical overlap across languages even to a small degree, then under the Avoidance Account translation equivalents would be harder to learn than singlets.
Contrary to the Avoidance Account, the Preference Account posits that translation equivalents are easier to learn than singlets. At a minimum, word learning requires encoding and representing the relevant sounds of a word, creating a mental representation of its referent, and linking the two. When a French–English bilingual child encounters the word “pomme” after having learned “apple,” one part of that process has already occurred in that the referent is already represented; because part of the word learning task is already accomplished, translation equivalents might therefore be easier to learn than singlets (e.g., Montanari, 2010; Poulin-Dubois et al., 2013; 2017). Moreover, research suggests that bilingual lexicons are not tightly encapsulated by language, but instead include cross-language mental links between words that are semantically related (e.g., Floccia et al., 2020; Jardak & Byers-Heinlein, 2018; Singh, 2013). In this context, the strong semantic overlap makes translation equivalents special, and could facilitate their acquisition (e.g., Bilson et al., 2015; Floccia et al., 2020). The Preference Account predicts that translation equivalents are more easily learned than singlets.
There are several lines of empirical evidence to support the Preference Account. For example, some early case studies reported that bilinguals tended to learn more translation equivalents than singlets when experiencing a shift in their language exposure that inverted their dominant and non-dominant languages (Lanvers, 1999; Pearson & Fernández, 1994). The main explanation that has been given for this finding is that additional exposure to their non-dominant language — which became their new dominant language — enabled fast mapping of words to already-lexicalized concepts.
Other evidence suggesting translation equivalents might be easier to learn than singlets comes from a study that included vocabulary-checklist data from 254 monolingual and 181 bilingual children aged 6 months to 7.5 years (Bilson et al., 2015). The researchers used a network analysis approach to investigate how translation equivalents are learned, focusing on the semantic relationships between the words (e.g., words like “cat” and “dog” are strongly semantically related). Using a statistical model that allowed free semantic relations among vocabulary data from monolingual and bilingual children, the results suggested that words were learned faster when they were semantically connected to more known words in children’s lexicons. This effect applied not only to words within the same language, but also to words across languages including translation equivalents (e.g., English “dog” and French “chien”) and words that had other cross-language relations (e.g., “cat” and “chien”). The authors then simulated bilingual vocabularies by modeling bilingual lexicons as combinations of two independent vocabulary-size-matched monolinguals. Comparison with actual bilingual children’s vocabulary data revealed that bilingual children acquired more translation equivalents than predicted by the simulation. The authors therefore concluded that bilingual children learn translation equivalents more easily than singlets. Note that, in their study, expected translation equivalent knowledge was simulated based on the number of lexical items that overlapped between two randomly chosen English monolinguals (e.g., whether both monolinguals knew the word “cat”). However, it is unclear whether this is an appropriate point of comparison for bilingual children, as this approach may overlook variables that impact bilinguals’ vocabulary learning, including vocabulary size in each language and the developmental level of a child, a point that we will return to later in the introduction.
Overall, there is some evidence that bilingual children more readily learn translation equivalents than singlets. If the strong semantic overlap between translation equivalents facilitates their learning, then under the Preference Account translation equivalents will be more easily learned than singlets.
The previous accounts rely on the idea that bilingual vocabulary development unfolds differently than monolingual development, as monolinguals encounter only singlets but bilinguals encounter both singlets and translation equivalents. There is an underlying assumption that translation equivalent learning is somehow special relative to singlet learning — the Avoidance Account proposes that translation equivalents are harder to learn than singlets, whereas the Preference Account proposes that translation equivalents are easier to learn than singlets. However, it is also possible that translation equivalents are neither harder nor easier for bilingual children to learn than singlets. We call this the Neutral Account.
The Neutral Account implies that bilingual children’s two languages develop relatively independently. Indeed, language and processing measures for bilingual children tend to be tightly correlated within a particular language, but only weakly, if at all, correlated across languages. For example, 30-month-old bilingual children’s processing efficiency in a particular language closely correlated with vocabulary size in that language, but was unrelated to vocabulary size in their other language (Marchman, Fernald, & Hurtado, 2010). Due to differences in the amount of language exposure, bilingual children seldom show equal vocabulary growth in both of their languages (e.g., Pearson & Fernández, 1994; Pearson et al., 1997), and the amount of exposure to a particular language has been reported to modulate the within-language association between language processing ability and vocabulary size (Hurtado et al., 2013). Bilingual children with greater exposure to a particular language tended to process that language faster, and in turn learned more words in that language.
In a study whose results support the Neutral Account, Pearson and colleagues (1995) randomly paired the single-language English lexicons from a subset of bilingual children with the single-language Spanish lexicons from another subset of bilingual children to derive a percentage of by-chance lexical overlaps shared between the single-language lexicons of two randomly paired children. The researchers found that the percentage of translation equivalents observed in English–Spanish bilingual children was similar to the by-chance percentage of translation equivalents between randomly paired children. This evidence implied that singlets and translation equivalents are equally learnable. In sum, the Neutral Account predicts that translation equivalents are as easy for bilingual children to learn as singlets.
The previous section discussed three theoretical accounts concerning the relative learnability of translation equivalents. However, to date, aspects of translation equivalent learning have mostly been examined in isolation, rather than integrated within the larger context of bilingual lexical development. In this section, we consider two proximal variables that we expect to predict the number of translation equivalents bilingual children know: vocabulary size in each language, and word learnability as a function of children’s developmental level.
Because translation equivalents are words from different languages that refer to the same concept, the number of words a bilingual knows in each of their languages will necessarily constrain the number of translation equivalent pairs they could possibly know. For example, a child with a less balanced vocabulary across the two languages might only say 5 words in one language but many more words in the other language; this means that the child could only produce a maximum of 5 translation equivalents, regardless of how many words they know in their other language. Conversely, it seems reasonable to expect that if a child knows a similar number of words in each language and thus has a more balanced vocabulary across the two languages, there would be more potential for some of those words to be translation equivalents.
Balance between the two vocabulary sizes is a function of the number of words bilingual children produce in each language, which tends to be tightly linked to their exposure to each language. In general, more language exposure leads to larger vocabulary size (e.g., Barnes & Garcia, 2012; Boyce et al., 2013; Hurtado et al., 2013; Marchman, Fernald, & Hurtado, 2010; Place & Hoff, 2011; Pearson et al., 1997). Bilingual children usually know more words in the language to which they have greater exposure (i.e., dominant language) relative to the language to which they have less exposure (i.e., non-dominant language; Pearson et al., 1997; Place & Hoff, 2011). This is because the more often a bilingual hears a language, the more opportunities there will be for learning new words in that language.
One important consideration in thinking about bilingual children’s experience is whether they encounter and use their languages within the same or different contexts. This is known as the Complementarity Principle (Grosjean, 2016). For example, for school-aged bilinguals, school-related words are more likely to be known in the language of schooling rather than in the home language (Bialystok et al., 2010). If certain words are encountered in particular contexts where only one language is used, bilinguals may have fewer opportunities to learn translation equivalents for these words. However, we argue that the Complementarity Principle is unlikely to strongly impact bilingual word acquisition in infancy. Most words in children’s early vocabularies could be considered “home words,” which include words for social and daily routines (“hello,” “more,” “diaper”), common nouns (“doggie”), and everyday verbs (“walk”). Such words are likely to be encountered across contexts where children spend the majority of their time, such as at home and at childcare. Thus, growing up in a bilingual context from birth, bilingual children presumably encounter “home words” in both languages. Accordingly, we assume that for the most part, bilingual children’s opportunities for learning words in each of their languages will be proportional to their overall exposure to the language, and largely not subject to the Complementarity Principle. We further consider implications of this assumption in the discussion section.
Finally, we must note that vocabulary acquisition is not solely tied to quantity of input, but is also predicted by a host of other factors such as children’s ability to segment words from the continuous stream of speech (e.g., Brent & Siskind, 2001; Swingley & Humphrey, 2018), children’s efficiency of processing words they hear (e.g., Hurtado et al., 2013; Weisleder & Fernald, 2013), cognitive development and perceptual bias (e.g., Benedict, 1979; Goodman et al., 2008), and family socioeconomic status (Fernald, Marchman, & Weisleder, 2013). Nonetheless, all else being equal, words that are encountered more frequently are acquired sooner than those encountered less frequently (Brent & Siskind, 2001; Goodman et al., 2008; Swingley & Humphrey, 2018).
An often overlooked factor that could contribute to bilingual children’s learning of translation equivalents is related to the changes in the learnability of different words over time based on children’s developmental level. Evidence from monolingual children shows that some types of words are characteristically learned before others. For example, across many languages including English, children show a noun bias in their early lexicons (Braginsky et al., 2019; Goodman et al., 2008), although for other languages such as Mandarin it appears that verbs and nouns are more equally acquired (Tardif, 1996). Certain classes of words are rarely known at the onset of lexical development, such as prepositions and words for time (Fenson et al., 2007). This is thought to be due to the cognitive and linguistic machinery that must be in place in order for children to represent these concepts, a necessary prerequisite for learning certain word types (Bergelson, 2020; Braginsky et al., 2019). If this is the case, then children might be more likely to learn translation equivalents than singlets, simply because translation equivalents are more likely to be learnable at their stage of development. That is, potential singlets might be “too hard” to be learned at a particular age. Thus, a seeming overabundance of translation equivalents might be a product of developmental constraints on word learning, rather than due to semantic facilitation.
Taking into account the contributions of language exposure and developmental level to bilingual children’s vocabulary acquisition, we put forward the Bilingual Vocabulary Model. This model proposes that the number of translation equivalents that bilingual children produce is a function of vocabulary learning in each language, in the context of the number of potentially learnable words given the children’s developmental level. We formalize learning a translation equivalent pair as the joint probability of learning each of the words in the pair. This provides a straightforward empirical test of different theoretical accounts of translation equivalent learning, by asking whether or not the probability of knowing a word is independent of knowing its translation equivalent. The logic is similar to that of the familiar chi-squared test for independence: the probability that two events from the same population both occur is P(A and B) = P(A) × P(B | A), where P(B | A) = P(B) if A and B are independent, so that under independence the joint probability is simply the product of the two individual probabilities (see Box 1 for a detailed example). The full model is shown in Figure 1. In the next paragraphs, we define each of the model parameters in detail, and these are also summarized in Table 1.
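The independence logic can be checked numerically. The following Python sketch is illustrative only and is not the authors' model code: it draws two sets of known items independently from a shared pool and compares the average overlap with the value given by the product rule. The function name simulate_overlap, the pool size, and the set sizes are all assumptions chosen for the example.

```python
import random


def simulate_overlap(pool_size, n_a, n_b, n_trials=2000, seed=0):
    """Average size of the overlap between two independently drawn item sets.

    pool_size, n_a, and n_b are hypothetical values, not data from the paper.
    """
    rng = random.Random(seed)
    pool = range(pool_size)
    total_overlap = 0
    for _ in range(n_trials):
        known_a = set(rng.sample(pool, n_a))  # items known in "language A"
        known_b = set(rng.sample(pool, n_b))  # items known in "language B"
        total_overlap += len(known_a & known_b)
    return total_overlap / n_trials


pool_size, n_a, n_b = 100, 40, 20

# Product rule under independence: P(A and B) = P(A) * P(B),
# scaled by the pool size to give an expected count of shared items.
expected = (n_a / pool_size) * (n_b / pool_size) * pool_size

observed = simulate_overlap(pool_size, n_a, n_b)
print(f"expected overlap: {expected:.2f}, simulated overlap: {observed:.2f}")
```

If the two sets were not drawn independently (for example, if items known in one set were more likely to be drawn into the other), the simulated overlap would depart systematically from the product-rule value, which is the kind of departure the model is designed to detect.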
Figure 1. Illustration of the Bilingual Vocabulary Model.
The model takes four main parameters: the number of words produced in the dominant language (DOM), the number of words produced in the non-dominant language (NONDOM), the number of potentially learnable words in each language (LEARNABLE), and a bias parameter (BIAS) which indicates whether the model is biased towards (BIAS > 1) or against (BIAS < 1) learning translation equivalents. The language in which a child knows more words is the dominant language, whereas the one in which a child knows fewer words is the non-dominant language. Next, we turn to the LEARNABLE parameter (i.e., the number of potentially learnable words). If DOM and NONDOM are measured with an instrument such as the MacArthur-Bates Communicative Development Inventories (CDI; Fenson et al., 2007), one option would be to set LEARNABLE to be the total number of items on the CDI. For convenience, consider the effect of setting LEARNABLE to 600, as a round number (the actual number of CDI items is usually slightly higher than 600, depending on the language of the adaptation). Very young children would not be expected to know many of the “harder” words on the CDI, such as “lawn mower,” “sidewalk,” or “vitamins,” due to children’s immature cognitive machinery and conceptual development. A more reasonable solution might be to determine how many CDI words are potentially learnable given the child’s developmental level, which could be approximated by their age. For example, imagine that Jamie, who is 18 months old, produces 50 English words and 20 French words, thus a total of 70 words. Monolingual children his age with the very largest productive vocabularies (those at the 90th percentile averaging between English and French norms) produce a total of 245 words (retrieved from the Wordbank database; Frank et al., 2016).
Although there is likely considerable individual variability as to the cognitive capacity even amongst children of the same age, we argue that this provides a reasonable — if imperfect — estimate of the number of learnable words (LEARNABLE) that a child of Jamie’s age could potentially acquire in each language. Thus, we might expect that Jamie could potentially have learned up to 245 English words and 245 French words, although he has thus far only learned 50 in English and 20 in French.
Using the mathematical concept of independence, we can then quantify the number of translation equivalents (TE) expected given children’s vocabulary sizes in the dominant (DOM) and non-dominant (NONDOM) languages, as well as the number of potentially learnable words (LEARNABLE). If dominant-language and non-dominant-language words are learned independently from each other, we multiply DOM × NONDOM (the number of words known in the dominant and non-dominant language respectively), and divide by the total population of learnable words in one language (LEARNABLE), which is the possible number of words that could overlap across both languages, to predict the number of translation equivalents. We further introduce the bias parameter (BIAS), which allows us to examine whether translation equivalent learning is best described by the Avoidance, Preference, or Neutral Account. Adding this parameter, translation equivalents can be derived from TE = BIAS × (DOM × NONDOM)/LEARNABLE. For the Avoidance Account, BIAS will be less than 1, meaning that translation equivalents are less easily learned than singlets; for the Preference Account, BIAS will be greater than 1, meaning that translation equivalents are more easily learned than singlets; for the Neutral Account, BIAS is exactly 1 (i.e., the model is unbiased with respect to whether translation equivalents are more difficult or easier to acquire than singlets). Going back to the example of 18-month-old Jamie, we would set the denominator at 245, which is the number of potentially learnable words at 18 months. If translation equivalents are half as easy to learn as singlets (following the Avoidance Account), we would expect Jamie to produce 0.5×(50×20/245) = 2.0 translation equivalents. Conversely, if translation equivalents are twice as easy to learn as singlets (following the Preference Account), we would expect Jamie to produce 2×(50×20/245) = 8.2 translation equivalents.
Under the Neutral Account, we would expect Jamie to learn 1×(50×20/245) = 4.1 translation equivalents.
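The worked example for Jamie can be reproduced with a few lines of Python. This is a minimal sketch of the formula TE = BIAS × (DOM × NONDOM)/LEARNABLE as stated in the text; the function name expected_te and its argument names are assumptions introduced for illustration, and the BIAS values of 0.5 and 2 are the illustrative values from the example rather than estimates.

```python
def expected_te(dom, nondom, learnable, bias=1.0):
    """Expected number of translation equivalents under the stated formula.

    TE = BIAS * (DOM * NONDOM) / LEARNABLE
    (hypothetical helper; the result is not capped or rounded here)
    """
    if learnable <= 0:
        raise ValueError("learnable must be positive")
    return bias * (dom * nondom) / learnable


# Jamie: 50 dominant-language words, 20 non-dominant-language words,
# and 245 potentially learnable words at 18 months.
accounts = [("Avoidance", 0.5), ("Neutral", 1.0), ("Preference", 2.0)]
for account, bias in accounts:
    te = expected_te(dom=50, nondom=20, learnable=245, bias=bias)
    print(f"{account} (BIAS = {bias}): {te:.1f} translation equivalents")
# prints 2.0, 4.1, and 8.2 for the three accounts respectively
```

Keeping BIAS as a separate multiplicative argument mirrors how the three accounts are distinguished in the text: the same vocabulary sizes yield different expected counts only through this one parameter.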
Finally, based on the main parameters, we can calculate additional, commonly-reported descriptors of bilingual vocabulary, which we detail below and describe as derived parameters.
Balance of vocabulary (BALANCE) is the proportion of total words that children produce in each language. For convenience, balance is defined in reference to the non-dominant language with the formula NONDOM/(DOM+NONDOM), such that scores can range from 0.0 (completely unbalanced) to 0.5 (completely balanced). For example, since 18-month-old Jamie produces 50 dominant vocabulary words and 20 non-dominant vocabulary words, he would have a balance score of 0.29. Note that this calculation does not take into account overlap in meaning across the two languages (i.e., how many of the words he produces are translation equivalents).
Word vocabulary (WORD; sometimes called total productive vocabulary) is the total number of words that a child produces across the two languages, calculated as the sum of the dominant vocabulary (DOM) and non-dominant vocabulary (NONDOM). Concept vocabulary (CONCEPT; sometimes called total conceptual vocabulary) is the number of concepts that are lexicalized by the child; that is, the total number of concepts that are lexicalized in either language. This can be calculated by subtracting the number of translation equivalents (TE) from the word vocabulary (WORD). Finally, we can also calculate singlets that are produced in each language, that is, words for which the child does not yet produce a translation equivalent. Singlets in the dominant language (DOM-SINGLET) can be calculated by subtracting translation equivalents (TE) from dominant-language vocabulary (DOM); singlets in the non-dominant language (NONDOM-SINGLET) can be calculated by subtracting translation equivalents (TE) from non-dominant language vocabulary (NONDOM). It is also possible to decompose children’s vocabulary into these components: concept vocabulary (CONCEPT) is the sum of TE, DOM-SINGLET, and NONDOM-SINGLET, and word vocabulary (WORD) is the same sum with each translation equivalent pair counted as two words (2 × TE + DOM-SINGLET + NONDOM-SINGLET).
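The derived parameters follow mechanically from DOM, NONDOM, and TE, as the short Python sketch below illustrates. The function name derived_parameters and the dictionary keys are assumptions made for this example; the TE count of 4 for Jamie is a hypothetical whole-number value close to the Neutral Account prediction, not an observed score.

```python
def derived_parameters(dom, nondom, te):
    """Derive commonly reported descriptors from DOM, NONDOM, and TE.

    te is the number of translation equivalent pairs (observed or predicted).
    """
    word = dom + nondom
    if word == 0:
        raise ValueError("balance is undefined for an empty vocabulary")
    return {
        "BALANCE": nondom / word,       # 0.0 (unbalanced) to 0.5 (balanced)
        "WORD": word,                   # total words across both languages
        "CONCEPT": word - te,           # concepts lexicalized in either language
        "DOM_SINGLET": dom - te,        # dominant-language words without a TE
        "NONDOM_SINGLET": nondom - te,  # non-dominant words without a TE
    }


# Jamie: 50 dominant words, 20 non-dominant words, hypothetical TE = 4.
jamie = derived_parameters(dom=50, nondom=20, te=4)
for name, value in jamie.items():
    print(f"{name}: {value:.2f}")

# Consistency checks that follow from the definitions above:
# each TE pair is one concept but contributes two words.
singlets = jamie["DOM_SINGLET"] + jamie["NONDOM_SINGLET"]
assert jamie["CONCEPT"] == singlets + 4
assert jamie["WORD"] == singlets + 2 * 4
```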
The current research aimed to better understand the nature of translation equivalent learning in bilingual children. Study 1 simulated the expected patterns of translation equivalent learning under the Bilingual Vocabulary Model proposed in the introduction, with reference to the proportion of words learned in the dominant and non-dominant language and the number of words that are learnable at various developmental levels. We also compared predicted learning outcomes for when translation equivalents are harder to learn than, easier to learn than, or as easy to learn as singlets.
In Study 2, we examined real-world translation equivalent development in light of the predictions from the Bilingual Vocabulary Model, using archival data from 200 French–English bilingual children aged 18 to 33 months, whose vocabularies and translation equivalent knowledge were measured by parent report using the MacArthur-Bates CDI: Words and Sentences form in English (Fenson et al., 2007) and Québec French (Trudeau et al., 1997). Together, the Bilingual Vocabulary Model and real-world data allowed us to examine contrasting hypotheses about translation equivalents: whether translation equivalents are harder to learn than singlets (Avoidance Account), easier to learn than singlets (Preference Account), or as easy to learn as singlets (Neutral Account).
Study 1 provides a computational implementation of the Bilingual Vocabulary Model outlined in the introduction (see also Figure 1), which we use to simulate different scenarios to examine the effect of vocabulary sizes and developmental variables on translation equivalent learning. Note that usually only three values are necessary to calculate all the other variables (see Table 1). Most commonly, we can calculate other variables based on the total number of learnable words (LEARNABLE) together with either the words known in each language (DOM and NONDOM) or word vocabulary plus balance (WORD and BALANCE) which allow us to compute DOM and NONDOM. It is also possible to calculate other variables based on the total number of learnable words (LEARNABLE) with balance and words known in either language (BALANCE and DOM or BALANCE and NONDOM).
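The conversions between these sets of input values follow from the definition BALANCE = NONDOM/(DOM + NONDOM). The Python sketch below works through the three cases mentioned above; the function names are assumptions introduced for illustration and do not come from the authors' implementation.

```python
def from_word_balance(word, balance):
    """Recover (DOM, NONDOM) from WORD and BALANCE."""
    nondom = balance * word          # BALANCE = NONDOM / WORD
    return word - nondom, nondom


def from_dom_balance(dom, balance):
    """Recover (DOM, NONDOM) from DOM and BALANCE."""
    # Rearranging BALANCE = NONDOM / (DOM + NONDOM) for NONDOM.
    nondom = dom * balance / (1 - balance)
    return dom, nondom


def from_nondom_balance(nondom, balance):
    """Recover (DOM, NONDOM) from NONDOM and BALANCE (requires BALANCE > 0)."""
    if balance <= 0:
        raise ValueError("DOM cannot be recovered when BALANCE is 0")
    dom = nondom * (1 - balance) / balance
    return dom, nondom


# All three routes describe the same hypothetical child:
# 300 words in total, 40% of them in the non-dominant language.
print(from_word_balance(word=300, balance=0.4))
print(from_dom_balance(dom=180, balance=0.4))
print(from_nondom_balance(nondom=120, balance=0.4))
```

Once DOM and NONDOM are recovered, the remaining variables can be computed together with LEARNABLE as described in the text.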
Three simulations were generated to explore expected patterns of translation equivalent learning under the Bilingual Vocabulary Model. In the first simulation, we examined how translation equivalent learning relates to vocabulary balance (BALANCE), as well as different metrics of vocabulary size, including dominant-language vocabulary (DOM), non-dominant language vocabulary (NONDOM), and word vocabulary (WORD). In the second simulation, we explored relationships between translation equivalents (TE), balance (BALANCE), and learnable words (LEARNABLE). In the first two simulations, the BIAS parameter was held constant at 1 (Neutral Account); in the third simulation, we varied the bias parameter (BIAS) to compare translation equivalent learning under the Avoidance, Preference, and Neutral Accounts. A summary of the parameter values used in each simulation is provided in Table 2.
In Simulation 1, we first illustrate the relationships between different variables in the model by simulating three hypothetical children who are at the same developmental level and thus have the same number of potentially learnable words (LEARNABLE), but with different word vocabularies (WORD) and balance (BALANCE). For convenience, we set LEARNABLE = 600 in this example, which roughly corresponds to what is expected for an English-learning 26-month-old (i.e., the most verbal 26-month-old English-learner at the 90th percentile of vocabulary produces around 600 words as retrieved from the Wordbank database; Frank et al., 2016). We set BIAS to 1, meaning that in these examples translation equivalents are as easy to learn as singlets.
Infant Annie (small vocabulary, unbalanced exposure) produces 270 words in the dominant language and 30 words in her non-dominant language. She has a word vocabulary of 300, and a balance score of .10 (10% of her words are in the non-dominant language). Based on the formula TE = DOM×NONDOM/LEARNABLE (we drop BIAS from the formula since it is 1 here) and as seen in Table 3, Annie is expected to produce 13.5 translation equivalents. Infant Bernie (small vocabulary, balanced exposure) produces 180 dominant-language words, and 120 non-dominant language words. Like Annie, he has a word vocabulary of 300, but he has a higher balance score of .40 (40% of his words are in the non-dominant language). Based on our formula, we expect Bernie to produce 36 translation equivalents. Comparing Annie and Bernie, two children who produce the same word vocabulary (i.e., WORD is held constant), the child with the more balanced vocabulary (Bernie) is expected to produce more translation equivalents. Like Bernie, infant Charlie also has a balanced vocabulary, but has a larger word vocabulary (WORD), producing 540 words in the dominant language (DOM) and 360 in the non-dominant language (NONDOM) for a total of 900 words (WORD), and thus BALANCE = .40. Based on our formula for Simulation 1, we expect Charlie to produce 324 translation equivalents (TE). Infants Bernie and Charlie illustrate that for two children equal in BALANCE, the child with the larger word vocabulary (WORD) is expected to produce more translation equivalents (TE). Other vocabulary metrics are calculated for each hypothetical child as described in Table 3.
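The three hypothetical children can be recomputed directly from the stated vocabulary sizes. The Python sketch below is an illustration under the formulas given in the text, not a reproduction of Table 3: the variable names are assumptions, and the derived metrics other than TE are computed here from the definitions rather than copied from the table.

```python
LEARNABLE = 600  # held constant for all three children
BIAS = 1.0       # Neutral Account

# (DOM, NONDOM) for each hypothetical child, as given in the text.
children = {
    "Annie": (270, 30),
    "Bernie": (180, 120),
    "Charlie": (540, 360),
}

rows = {}
for name, (dom, nondom) in children.items():
    word = dom + nondom
    te = BIAS * dom * nondom / LEARNABLE
    rows[name] = {
        "WORD": word,
        "BALANCE": nondom / word,
        "TE": te,
        "CONCEPT": word - te,
        "DOM_SINGLET": dom - te,
        "NONDOM_SINGLET": nondom - te,
    }
    print(f"{name}: WORD = {word}, BALANCE = {nondom / word:.2f}, TE = {te:.1f}")
# prints TE = 13.5 for Annie, 36.0 for Bernie, and 324.0 for Charlie
```

Holding WORD constant (Annie versus Bernie) isolates the effect of BALANCE, and holding BALANCE constant (Bernie versus Charlie) isolates the effect of WORD, which is the comparison the paragraph draws.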
We then broadened this simulation to the more general case and examined patterns of translation equivalent learning, where simulated children had the capacity to learn 600 words (LEARNABLE held constant at 600), and their vocabulary size in each language (DOM and NONDOM) varied. BIAS was once again constant at 1. Data from a total of 216 simulated children were generated (see Table 2 for a summary of the parameter values used in this simulation). Based on these values, we derived simulated children’s word vocabulary (WORD, calculated as DOM+NONDOM) and their vocabulary balance (BALANCE, calculated as NONDOM/(DOM+NONDOM)). In Figure 2, we plotted TE knowledge as a function of DOM, NONDOM, and WORD at different levels of BALANCE. Across all three Panels (1A, 1B, and 1C), simulated children with the most balanced vocabulary consistently produced more translation equivalents than other children. Moreover, Panels 1A and 1C show that, as the number of DOM (dominant language words) and WORD (word vocabulary) increased, TE also increased regardless of BALANCE. Interestingly, Panel 1B shows that NONDOM and TE were extremely tightly coupled. In sum, we observed three important patterns, which served as Prediction Set 1 from the Bilingual Vocabulary Model for Study 2:
Number of translation equivalents (TE) across different levels of vocabulary balance (BALANCE) in relation to dominant vocabulary size (DOM; Panel A), non-dominant vocabulary size (NONDOM; Panel B), and word vocabulary (WORD; Panel C). Row 1 represents the simulated data in Study 1 while holding the number of learnable words (LEARNABLE) constant at 600 and BIAS constant at 1. Row 2 represents the observed vocabulary data in Study 2.
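The patterns in Simulation 1 can also be read directly off the formula. As a sketch that uses only the definitions already given (WORD = DOM + NONDOM and BALANCE = NONDOM/WORD, so that NONDOM = BALANCE × WORD and DOM = (1 − BALANCE) × WORD), the expected number of translation equivalents can be rewritten as:

\[
\text{TE} = \text{BIAS} \times \frac{\text{DOM} \times \text{NONDOM}}{\text{LEARNABLE}} = \text{BIAS} \times \frac{\text{WORD}^2 \times \text{BALANCE} \times (1 - \text{BALANCE})}{\text{LEARNABLE}}
\]

For a fixed WORD, the term BALANCE × (1 − BALANCE) is largest at BALANCE = .50, so more balanced children are expected to produce more translation equivalents; for a fixed BALANCE, TE grows with the square of WORD. As a check against the hypothetical child Bernie (WORD = 300, BALANCE = .40, LEARNABLE = 600, BIAS = 1): 300² × .40 × .60 / 600 = 36.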
In our previous simulation, we assumed that each simulated child was at the same developmental level and had the capacity to learn up to 600 words in each language (i.e., LEARNABLE held constant at 600). As laid out in the introduction, under the Bilingual Vocabulary Model, the learnability of different words changes with a child’s developmental level, where LEARNABLE increases as a child grows older. Therefore, in Simulation 2, we looked at the expected patterns of translation equivalent learning across varying levels of LEARNABLE (i.e., the number of learnable words in each language as developmental level changes). Additionally, we further examined vocabulary composition by computing the number of singlets in the dominant (DOM-SINGLET) and non-dominant (NONDOM-SINGLET) language. BIAS was once again kept constant at 1.
Translation equivalent knowledge was simulated across children at three developmental levels (the number of LEARNABLE words = 300, 450, 600), in conjunction with a wide range of values for words in the dominant language (DOM) and the non-dominant language (NONDOM). In total, data from 161 simulated children were generated (see Table 2 for a summary of the parameters used in this simulation). Again, balance (BALANCE) was calculated based on the values of DOM and NONDOM. We also calculated the number of singlet words in the dominant (DOM-SINGLET) and non-dominant (NONDOM-SINGLET) languages, so that simulated children’s concept vocabulary (CONCEPT) could be decomposed as the sum of TE (translation equivalents), DOM-SINGLET, and NONDOM-SINGLET. Figure 3 plots this decomposition for simulated children of different developmental levels, with vocabulary ranging from most balanced (BALANCE = .35 - .50), to medium balanced (BALANCE = .20 - .35), to least balanced (BALANCE = .00 - .20).
Number of translation equivalents (TE) and singlets in dominant (DOM-SINGLET) and non-dominant language (NONDOM-SINGLET) across different developmental levels/ages, which sets the number of LEARNABLE words. Panel A represents the model simulation in Study 1, where developmental levels of simulated children are set at three values: LEARNABLE = 300, 450, and 600. Panel B represents the observed vocabulary data in Study 2, where developmental level was divided into 3 subsets with children of 18-22 months (left), children of 23-27 months (middle), and children of 28-33 months (right). Proportion of balance (BALANCE) was divided into three groups, where the least balanced group had a range of .00 - .20 vocabulary balance, the medium balanced group had a range of .20 - .35, and the most balanced group had a range of .35 - .50.
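The decomposition of concept vocabulary can be sketched as follows. The function name `decompose_vocabulary` and the example child are illustrative assumptions; the relationships (singlets as DOM − TE and NONDOM − TE, and CONCEPT as their sum with TE) are those stated in the text.

```python
def decompose_vocabulary(dom, nondom, learnable, bias=1.0):
    """Split a simulated child's vocabulary into TEs and singlets."""
    te = bias * dom * nondom / learnable
    dom_singlet = dom - te        # dominant-language words with no TE
    nondom_singlet = nondom - te  # non-dominant-language words with no TE
    # Each TE pair counts as one concept, so CONCEPT also equals WORD - TE.
    concept = te + dom_singlet + nondom_singlet
    return {
        "TE": te,
        "DOM-SINGLET": dom_singlet,
        "NONDOM-SINGLET": nondom_singlet,
        "CONCEPT": concept,
    }


# Hypothetical child with DOM = 180 and NONDOM = 120 at LEARNABLE = 600.
parts = decompose_vocabulary(180, 120, 600)
print(parts)
```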
In general, simulated children at a later developmental level had larger concept vocabularies (CONCEPT). We also continued to observe the pattern reported in Prediction 1a, whereby simulated children with more balanced vocabularies produced more translation equivalents (TE). Moreover, regardless of balance, simulated children at later developmental levels (i.e., older children with more potentially LEARNABLE words) acquired more translation equivalents (TE). Overall, we generated four additional predictions (Prediction Set 2) from the Bilingual Vocabulary Model. Compared to children at an earlier developmental level (i.e., younger infants with fewer potentially learnable words), children at a later developmental level (i.e., older infants with more potentially learnable words) will
In Simulations 1 and 2, we modeled cases in accordance with the Neutral Account, where dominant-language and non-dominant-language words were learned independently, such that the bias parameter (BIAS) was exactly 1 when we calculated TE as DOM×NONDOM/LEARNABLE. In our final simulation, we examined cases where dominant-language and non-dominant-language words were not independent, corresponding to the Avoidance Account and the Preference Account. Mathematically, this requires varying the BIAS parameter. For the Preference Account, BIAS will be greater than 1, meaning that TEs are more easily learned than singlets. For the Avoidance Account, on the other hand, BIAS will be less than 1, meaning that TEs are less easily learned than singlets.
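One way to see why independence corresponds to BIAS = 1 is the following sketch. It assumes, for illustration only, that a child's DOM and NONDOM words are spread evenly over the LEARNABLE concepts, so that the probability of knowing the label for any given concept is DOM/LEARNABLE in the dominant language and NONDOM/LEARNABLE in the non-dominant language. If the two events are independent, the expected number of concepts labelled in both languages is:

\[
\text{TE} = \text{LEARNABLE} \times \frac{\text{DOM}}{\text{LEARNABLE}} \times \frac{\text{NONDOM}}{\text{LEARNABLE}} = \frac{\text{DOM} \times \text{NONDOM}}{\text{LEARNABLE}}
\]

Departures from independence are then captured by a multiplicative factor:

\[
\text{TE} = \text{BIAS} \times \frac{\text{DOM} \times \text{NONDOM}}{\text{LEARNABLE}}
\]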
Translation equivalent (TE) knowledge was first simulated across different developmental levels (as indicated by the number of LEARNABLE words = 150, 300, 450, 600), in conjunction with a wide range of values for DOM and NONDOM. Again, BALANCE and word vocabulary (WORD) were calculated based on the values of DOM and NONDOM. The final simulated data set contained 166 data points (see Table 2 for a summary of the parameters used). Three scenarios of translation equivalent learning (TE) were then generated using the formula TE = BIAS × DOM×NONDOM/LEARNABLE. To illustrate the Avoidance Account, BIAS was set at 0.5 (i.e., TEs are 50% less likely to be learned than singlets). To illustrate the Neutral Account, BIAS was set at 1 (i.e., TEs are as likely to be learned as singlets). Finally, to illustrate the Preference Account, BIAS was set at 1.5 (i.e., TEs are 50% more likely to be learned than singlets). In Figure 4, we illustrate the three different scenarios of simulated translation equivalent (TE) knowledge. We continued to observe a pattern consistent with Prediction 1a: in all cases, simulated children with more balanced vocabularies (BALANCE) produced more translation equivalents (TE). Thus, the overall relationship between BALANCE and TE remained similar across the Avoidance, Preference, and Neutral Accounts. What changed was the slope of translation equivalent learning: the slopes were shallowest under the Avoidance Account (BIAS = 0.5) and steepest under the Preference Account (BIAS = 1.5). With this, we further outline Prediction Set 3:
Different scenarios of expected translation equivalent learning (TE) as a function of word vocabulary (WORD), under scenarios where TEs are harder to learn than singlets (BIAS < 1), easier to learn than singlets (BIAS > 1), or as easy to learn as singlets (BIAS = 1).
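The three scenarios differ only in the value of BIAS, which the following sketch makes concrete. The BIAS values (0.5, 1, 1.5) are those used in Simulation 3; the example child (DOM = 180, NONDOM = 120, LEARNABLE = 600) and the names in the code are illustrative assumptions.

```python
def expected_te(dom, nondom, learnable, bias):
    """TE = BIAS * DOM * NONDOM / LEARNABLE."""
    return bias * dom * nondom / learnable


# BIAS values used to illustrate each account in Simulation 3.
SCENARIOS = {"Avoidance": 0.5, "Neutral": 1.0, "Preference": 1.5}

# Hypothetical child: DOM = 180, NONDOM = 120, LEARNABLE = 600.
te_by_account = {
    account: expected_te(180, 120, 600, bias)
    for account, bias in SCENARIOS.items()
}

for account, te in te_by_account.items():
    print(f"{account}: expected TE = {te:.0f}")
```

Because BIAS enters multiplicatively, it rescales the expected TE for every child by the same factor, which is why the accounts differ in slope rather than in the direction of the relationship with BALANCE.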
In Study 1, we used a simulation based on the Bilingual Vocabulary Model to generate several predictions about the relationship between translation equivalent knowledge and other vocabulary variables. In Study 2, we tested these predictions using archival vocabulary data from 200 French–English bilingual children aged 18 to 33 months.
Ethics approval was obtained from the Human Research Ethics Board of Concordia University (Certification Number 10000439), and informed consent was obtained from the children’s parents.
Archival data from 200 bilingual children acquiring English and French (age range: 18.38 - 33.5 months; 94 girls and 106 boys) who participated in prior studies at the XYZ lab were included in the present study, drawn from the same set of participants as Gonzalez-Barrero et al. (2020). Some children took part in more than one in-lab study (n = 28) and thus contributed data at more than one time point, so the number of data points exceeded the number of unique participants. The total number of data points included in the analyses was 229 (i.e., 229 English and 229 French CDI questionnaires). Participants were recruited through government birth lists, online ads, daycares, and infant-parent group activities (e.g., children’s library activities). Inclusion criteria were the following: full-term pregnancy (i.e., > 36 weeks of gestation), normal birth weight (> 2500 grams), and absence of major medical conditions (e.g., meningitis). Only children who had complete data in both CDI forms (i.e., English and French) were retained for analysis. Bilingual children were defined as those whose global exposure over the course of their lives was at least 25% to each of English and French, with less than 10% exposure to a third language. For children who participated more than once, language exposure met these criteria at all visits. Following the approach in Study 1, children’s dominant language was deemed to be the language in which the child produced a greater number of words; vocabulary balance was then determined based on the proportion of words produced in the non-dominant language relative to the total words produced across both languages, using the same formula as in Study 1: NONDOM/(DOM+NONDOM). Of the 229 data points, 59.83% were from English-dominant children and 40.17% were from French-dominant children. Data collection was conducted in Montréal, Québec, Canada.
Montréal is a multicultural city where both English and French are widely used in society. Children’s demographic characteristics including age, maternal education, and language exposure, are presented in Table 4.
| | Mean | \(SD\) | Range |
|---|---|---|---|
| Age in months | 24.4 | 4.7 | 18.4 - 33.5 |
| Maternal education in years | 16.6 | 2.1 | 10 - 21 |
| % Global exposure to English | 51.7 | 14.8 | 25 - 75 |
| % Global exposure to French | 47.8 | 15.0 | 25 - 75 |
| % Global exposure to Other | 0.6 | 1.8 | 0 - 10 |
Bilingual children’s expressive vocabulary was measured by the Words and Sentences form of the MacArthur-Bates CDI. Caregivers completed the original CDI English version (Fenson et al., 2007) and its Québec French adaptation (Trudeau et al., 1997). We asked the caregiver more familiar with each language to complete the respective CDI form. The forms were filled out by mothers (64%), fathers (7%), both parents (4%), or others (< 1%; e.g., grandmother); the respondent was not indicated for 24% of forms. In some cases different caregivers filled out each form, while in other cases the same caregiver filled out both forms. Our analyses focused on the vocabulary checklist of this questionnaire, which includes different nouns, verbs, adjectives, and other words used by young children. There are 680 words in the English CDI version and 664 in the Québec French version.
Translation equivalents (TE) were determined in the same manner as Gonzalez-Barrero et al. (2020) by three proficient bilingual French–English adults who carefully examined each language version of the CDI. Word pairs that made reference to the same concept (e.g., English “apple” and French “pomme”) were considered to be translation equivalents. In cases of disagreement, a discussion of the likely uses of the word in question by children (rather than potential adult uses of the word) was conducted and then a decision was made. Words that had similar phonetic realizations (e.g., English “alligator” and French “alligator”) were also considered translation equivalents. Most of the items on both vocabulary checklists had an equivalent word in the other language, which resulted in a total of 611 translation equivalents. A full list of translation equivalents is available at [https://osf.io/2t5kw/].
After determining the dominant language of a child based on the vocabulary size, we then computed the number of singlets that children knew in their dominant (DOM-SINGLET) and non-dominant (NONDOM-SINGLET) languages by deducting the number of translation equivalents produced from the total number of words produced in each language (i.e., DOM - TE and NONDOM - TE as in Study 1). Concept vocabulary (CONCEPT) was computed based on the number of concepts for which a child produced a word, calculated by subtracting the number of translation equivalents from word vocabulary (i.e., WORD - TE as in Study 1).
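The derived measures for an observed child can be sketched as below. The function name, argument names, and example counts are hypothetical; breaking ties in favour of English is an assumption made only so the function is well defined, since the text does not state how ties were handled.

```python
def vocabulary_measures(english_words, french_words, te):
    """Derive DOM, NONDOM, BALANCE, singlets, and CONCEPT from CDI counts.

    Assumes the child produces at least one word overall (WORD > 0).
    """
    # Dominant language = language with more words produced.
    # Tie-breaking toward English is an assumption, not stated in the text.
    if english_words >= french_words:
        dominant, dom, nondom = "English", english_words, french_words
    else:
        dominant, dom, nondom = "French", french_words, english_words

    word = dom + nondom
    return {
        "dominant": dominant,
        "DOM": dom,
        "NONDOM": nondom,
        "WORD": word,
        "BALANCE": nondom / word,       # NONDOM / (DOM + NONDOM)
        "DOM-SINGLET": dom - te,        # DOM - TE
        "NONDOM-SINGLET": nondom - te,  # NONDOM - TE
        "CONCEPT": word - te,           # WORD - TE
    }


# Hypothetical child: 80 English words, 120 French words, 50 TE pairs.
measures = vocabulary_measures(english_words=80, french_words=120, te=50)
print(measures)
```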
Children’s language exposure was measured using the Language Exposure Questionnaire (LEQ; Bosch & Sebastián-Gallés, 2001) and the Multilingual Approach to Parent Language Estimates (MAPLE; Byers-Heinlein et al., 2018). The LEQ is a structured interview that lasts approximately 15 minutes. It includes targeted questions that quantify the child’s language exposure from birth until their current age. The LEQ and MAPLE provide a global language exposure estimate based on the number of hours the child is exposed to each language within all contexts (e.g., home, daycare, etc.). Children’s average global exposure to each language is described in Table 4.
Caregivers were asked to fill out the CDI questionnaires as part of their child’s participation in experimental studies on language development, speech perception, and word learning. Caregivers were instructed to check off the words produced by their child using either a paper CDI questionnaire or the same questionnaire administered on a tablet. Data from paper-based questionnaires were double-entered and checked by trained research assistants.
Data analyses were conducted using R (Version 4.0.2, 2020). Analysis scripts and the data set used in the present study are available at [https://osf.io/2t5kw/]. We first present descriptive measures of vocabulary, and then tests of the three sets of predictions generated in Study 1.
On average, bilinguals in the sample had a word vocabulary size (WORD) of 295 (SD = 254.6), with a wide range of 6 - 1071 words. As expected given the way language dominance was defined, children produced more words in their dominant language (DOM; M = 206.1, SD = 175.6, range = 4 - 657) than in their non-dominant language (NONDOM; M = 88.9, SD = 98.5, range = 2 - 469), t(228) = 13.89, \(p < .001\), \(d = 0.92\).
Children produced an average of 67.7 translation equivalents (TE; SD = 85.1, range = 1 - 409). The remainder of words were singlets: Children produced many more singlets in their dominant language (DOM-SINGLET; M = 138.4, SD = 124.4, range = 2 - 523) than in their non-dominant language (NONDOM-SINGLET; M = 21.2, SD = 20.1, range = 0 - 94), t(228) = 13.89, \(p < .001\), \(d = 0.92\). On average, children’s concept vocabulary size was 227.3 (CONCEPT; SD = 181.3, range = 4 - 695).
Vocabulary balance (BALANCE) was then determined based on the proportion of total words produced in the non-dominant language, following the formula BALANCE = NONDOM/WORD as in Study 1. On average, bilingual children in our sample had a balance score (BALANCE) of 0.31 (SD = 0.13), ranging from 0.02 to 0.50. Vocabulary balance was similar for children who were English-dominant and those who were French-dominant, t(200.43) = 0.57, \(p = .566\), \(d = 0.08\). The 59.83% of children who were English-dominant had an average BALANCE of 0.31 (SD = 0.13, range = 0.02 - 0.50), whereas the remaining 40.17% who were French-dominant had an average BALANCE of 0.30 (SD = 0.12, range = 0.05 - 0.50).
Note that in this paper, we defined BALANCE in terms of relative vocabulary in each language, but for young bilinguals balance can also be considered in terms of input in each language. We therefore compare the vocabulary balance with the proportion of exposure bilingual children received in their non-dominant language. To make values comparable, the language designated as DOM and NONDOM was based on vocabulary-defined dominance, rather than the language that children heard most and least often. For most children, the language in which they produced the most words was also the language that they heard most often (181 children, 79.04%), although this was not the case for some children (48 children, 20.96%). The correlation between vocabulary-defined BALANCE and the raw percentage of exposure to the non-dominant language was moderate, r(227) = 0.45, \(p < .001\) (see also Figure 5). Thus, these two constructs were related, although not identical.
Correlation between balance defined by vocabulary (BALANCE) and balance defined by exposure.
Prediction Set 1 pertained to the pairwise relationships between word vocabulary (WORD), dominant (DOM) and non-dominant vocabulary (NONDOM), vocabulary balance (BALANCE), and translation equivalents (TE), which we examined through Pearson’s correlations. Overall, the univariate statistics showed strong correspondence with the relationships predicted by Prediction Set 1 under the Bilingual Vocabulary Model (see Table 5 for a full table of pairwise correlations).
Prediction 1a was that children with more balanced vocabularies would produce more translation equivalents. As shown in Figure 2 Row 2, our vocabulary data confirmed the prediction, r(227) = 0.25, \(p < .001\), where children with the most balanced vocabulary produced the most translation equivalents. We further tested this prediction by dividing children into five balance groups (0 < BALANCE ≤ 0.1, 0.1 < BALANCE ≤ 0.2, 0.2 < BALANCE ≤ 0.3, 0.3 < BALANCE ≤ 0.4, and 0.4 < BALANCE ≤ 0.5), and a one-way ANOVA revealed a significant effect of BALANCE, F(4, 224) = 3.61, \(p = .007\). The children in the group with BALANCE up to 0.5 (i.e., with the most balanced vocabularies) produced the most translation equivalents, whereas children in the group with BALANCE up to 0.1 (i.e., with the least balanced vocabularies) produced the fewest translation equivalents. Detailed descriptive statistics are reported in Table 6.
Prediction 1b was that children with larger word vocabularies and larger dominant-language vocabularies would produce more translation equivalents, and the results from our dataset confirmed this prediction for both word vocabulary (WORD), r(227) = 0.90, \(p < .001\), and dominant-language vocabulary (DOM), r(227) = 0.76, \(p < .001\). Figure 2 Row 2 further illustrates these relationships observed in our dataset.
Prediction 1c was that children who produce more words in the non-dominant language (NONDOM) would produce more translation equivalents (TE), specifically that this relationship would be nearly perfect. As shown in Figure 2 Row 2, we observed that these two variables were indeed nearly perfectly correlated, r(227) = 0.99, \(p < .001\).
| | LEARNABLE | BALANCE | WORD | DOM | NONDOM | TE | DOM-SINGLET | NONDOM-SINGLET |
|---|---|---|---|---|---|---|---|---|
| Age (in months) | 0.95**** | -0.24*** | 0.65**** | 0.69**** | 0.45**** | 0.48**** | 0.65**** | 0.19** |
| LEARNABLE | | -0.23*** | 0.62**** | 0.65**** | 0.43**** | 0.44**** | 0.62**** | 0.21** |
| BALANCE | | | -0.07 | -0.29**** | 0.35**** | 0.25**** | -0.58**** | 0.63**** |
| WORD | | | | 0.96**** | 0.87**** | 0.90**** | 0.74**** | 0.44**** |
| DOM | | | | | 0.70**** | 0.76**** | 0.89**** | 0.23*** |
| NONDOM | | | | | | 0.99**** | 0.31**** | 0.72**** |
| TE | | | | | | | 0.38**** | 0.60**** |
| DOM-SINGLET | | | | | | | | -0.09 |
| NONDOM-SINGLET | | | | | | | | |
| CONCEPT | | | | | | | | |
Note. *** p < .001, ** p < .01, * p < .05.
Prediction Set 2 pertained to expected patterns of acquisition of translation equivalents and singlets for children of different developmental levels. In our data set, developmental level was approximated by children’s age. Figure 3 Panel B shows the concept vocabulary (CONCEPT) of the bilingual children as a function of different ages (a proxy for developmental level), used to estimate the number of LEARNABLE words. To illustrate the acquisition of translation equivalents and singlets at different developmental levels, we divided children into three age groups: younger children of 18–22 months, middle children of 23–27 months, and older children of 28–33 months.
Prediction 2a was that older children (i.e., those at a later developmental level) would have larger concept vocabularies than younger children (i.e., those at an earlier developmental level). We observed a positive correlation between age (used as a proxy for developmental level, which determines LEARNABLE) and concept vocabulary (CONCEPT) in our dataset, r(227) = 0.69, \(p < .001\), confirming the prediction. This pattern was further confirmed by a one-way ANOVA, in which the three age groups differed significantly in the size of their concept vocabulary, \(F\)(2, 226) = 90.86, \(p < .001\). Older children (28–33 months; i.e., at a later developmental level) had the largest concept vocabularies, with an average of 414.6; children in the middle group (23–27 months; i.e., at an intermediate developmental level) had an average concept vocabulary of 252.1; and younger children (18–22 months; i.e., at an earlier developmental level) had the smallest, with an average of 119.9 (\(p\)s < .001).
Prediction 2b was that older children would produce more translation equivalents than younger children. First, we observed a positive correlation between age (our proxy for LEARNABLE) and number of translation equivalents in our dataset, r(227) = 0.48, \(p < .001\), and therefore confirmed the prediction. In a one-way ANOVA with age group as factor, we further found that groups differed in how many translation equivalents they produced (\(F\)(2, 226) = 31.74, \(p < .001\)). Younger children of 18–22 months produced an average of 33.7 translation equivalents, middle children of 23–27 months produced an average of 71.1 translation equivalents, and older children of 28–33 months produced an average of 131.4 translation equivalents (\(p\)s < .01).
Prediction 2c was that both older children and those with the least balanced vocabularies (BALANCE) would produce more dominant-language singlets (DOM-SINGLET). This pattern was confirmed by the results from our dataset, with a positive correlation between dominant-language singlets (DOM-SINGLET) and age (which determined LEARNABLE), r(227) = 0.65, \(p < .001\), and a negative correlation between BALANCE and dominant-language singlets (DOM-SINGLET), r(227) = -0.58, \(p < .001\). As shown in Figure 3 Panel B, children were divided into least balanced (range of balance: .00 - .20), medium balanced (range of balance: .20 - .35), and most balanced (range of balance: .35 - .50) groups (i.e., the same criteria as in Figure 5). In a one-way ANOVA with balance group as a between-subjects factor, we observed that the least balanced children produced the most singlets in their dominant language (\(p\)s < .001), with the least balanced, medium balanced, and most balanced children producing on average 255.5, 141.5, and 71.9 dominant-language singlets, respectively, \(F\)(2, 226) = 50.77, \(p < .001\).
Prediction 2d was that older children and those with the most balanced vocabularies (BALANCE) would produce more singlets in their non-dominant language. This pattern was also observed in our dataset, with a positive correlation between the number of non-dominant singlets (NONDOM-SINGLET) and age (which determined LEARNABLE), r(227) = 0.19, \(p = .005\), and a positive correlation between BALANCE and the number of non-dominant singlets (NONDOM-SINGLET), r(227) = 0.63, \(p < .001\). In a one-way ANOVA with balance group as a between-subjects factor, we confirmed that children who differed in how balanced their vocabulary knowledge was also differed in how many singlets they produced in their non-dominant language, \(F\)(2, 226) = 61.89, \(p < .001\). As shown in Figure 3 Panel B, we observed that children produced very few singlets in their non-dominant language, although the most balanced children produced the most singlets in their non-dominant language (mean of the most balanced children = 35.1 > mean of the medium balanced children = 15.5 > mean of the least balanced children = 5.7; \(p\)s < .001).
Prediction Set 3 pertained to the overall nature of translation equivalent learning, describing expected patterns of translation equivalent learning under the Neutral Account, the Avoidance Account, or the Preference Account. To directly test the correspondence of our data with these different accounts, we built a linear regression model predicting the observed number of translation equivalents from the Bilingual Vocabulary Model using the formula TE = BIAS × DOM×NONDOM/LEARNABLE, and we allowed the model to estimate the BIAS parameter.
First, we walk through the parameters in this model. The size of the dominant vocabulary (DOM) and the size of the non-dominant vocabulary (NONDOM) were the numbers of words produced by each child in the vocabulary data. The number of learnable words (LEARNABLE) was determined by averaging the English and French productive CDI vocabulary at the 90th percentile at each age, as obtained from Wordbank (Frank et al., 2016); Table 7 lists this denominator at different ages. For example, for an 18-month-old infant, the denominator was 244.9 words, calculated by averaging the 268.7 English words and 221.1 French words that 18-month-old children at the 90th percentile would typically produce. For children between 31 and 33 months in our dataset, the 90th percentile value for 30-month-old children was used, since 90th percentile information was available only up to 30 months.
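The LEARNABLE denominator described above can be sketched in two small steps: choosing the age at which to look up the norms, and averaging the two languages. The function names are hypothetical, and flooring fractional ages to whole months is an assumption, since the text does not specify how fractional ages were matched to the norms. Only the 18-month values (268.7 and 221.1) come from the text.

```python
import math

MAX_NORM_AGE = 30  # oldest age (months) with 90th-percentile norms, per the text


def norm_age(age_months, max_age=MAX_NORM_AGE):
    """Whole-month age used for the norm lookup.

    Flooring is an assumption; ages above 30 months reuse the 30-month norms.
    """
    return min(math.floor(age_months), max_age)


def learnable(english_p90, french_p90):
    """LEARNABLE = mean of the English and French 90th-percentile vocabularies."""
    return (english_p90 + french_p90) / 2


print(norm_age(18.38), norm_age(32.4))
print(round(learnable(268.7, 221.1), 1))
```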
Furthermore, the intercept of the linear regression model was set at 0 since no translation equivalents are expected to be produced if a child does not know any dominant or non-dominant vocabulary (i.e., when the predictor variables are 0). To reproduce the Bilingual Vocabulary Model’s formula TE = DOM×NONDOM/LEARNABLE, an interaction between dominant and non-dominant vocabulary was entered in the model, but main effects were not included in the model (denoted in R by using a colon rather than an asterisk between the interacting predictors). Therefore, our final linear regression model equation was:
ObservedTE ~ 0 + Dominant vocabulary:Non-dominant vocabulary / 90th percentile of CDI items.
(In R, the model was entered as:
ObservedTE * 90th percentile of CDI items ~ 0 + Dominant vocabulary:Non-dominant vocabulary)
With the observed number of translation equivalents as the dependent variable, the regression coefficient estimated by the model gives the value of the BIAS parameter most consistent with the empirical vocabulary data, which in turn indicates whether bilingual children were biased towards or against learning translation equivalents. If the coefficient is close to 1, then there is no bias and translation equivalents are learned as easily as other words (i.e., the Neutral Account). A coefficient less than 1 represents a bias against learning translation equivalents, where translation equivalents are less easily learned (i.e., the Avoidance Account), and a coefficient greater than 1 represents a bias towards learning translation equivalents, where translation equivalents are more easily learned (i.e., the Preference Account).
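For a regression through the origin with a single predictor, the least-squares coefficient has the closed form Σxy / Σx². The sketch below applies that textbook formula to the specification described above (outcome ObservedTE × LEARNABLE, predictor DOM × NONDOM). It is an illustration in plain Python rather than the authors' R script; the function name `fit_bias` and the toy children are hypothetical, with toy TE values generated from a known BIAS of 1.5 so the estimate can be checked.

```python
def fit_bias(children):
    """Estimate BIAS by least squares through the origin.

    children: iterable of (dom, nondom, observed_te, learnable) tuples.
    """
    sum_xy = 0.0
    sum_xx = 0.0
    for dom, nondom, observed_te, learnable in children:
        x = dom * nondom             # predictor: DOM x NONDOM interaction
        y = observed_te * learnable  # outcome: ObservedTE * LEARNABLE
        sum_xy += x * y
        sum_xx += x * x
    return sum_xy / sum_xx           # slope of a no-intercept regression


# Hypothetical toy data generated with a known BIAS of 1.5.
TRUE_BIAS = 1.5
toy_children = [
    (dom, nondom, TRUE_BIAS * dom * nondom / learnable, learnable)
    for dom, nondom, learnable in [(270, 30, 600), (180, 120, 600), (100, 50, 300)]
]

estimated_bias = fit_bias(toy_children)
print(f"Estimated BIAS = {estimated_bias:.2f}")
```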
Our model showed an excellent model fit of \(R^2\) = 0.96, indicating that our model explained 96% of the variance in bilinguals’ translation equivalent knowledge. The linear regression model estimated a BIAS coefficient of 1.02, \(p < .001\). This value is extremely close to 1, suggesting that our data are consistent with the account whereby translation equivalents are learned equivalently to other words.
To illustrate the close fit between the Neutral Account and our data, we used the Bilingual Vocabulary Model formula TE = 1 × (DOM×NONDOM/LEARNABLE) to estimate each child’s expected translation equivalent knowledge (setting BIAS = 1), which is plotted against our observed data in Figure 6. Expected and observed translation equivalents were closely aligned with the Neutral Account of the Bilingual Vocabulary Model (i.e., BIAS = 1), suggesting that the Neutral Account provides a parsimonious explanation for bilinguals’ translation equivalent knowledge. This provides evidence for the notion that translation equivalents are neither harder nor easier to learn than singlets in bilingual vocabulary learning. Note that visual inspection suggested that there could be some outliers. Cook’s distance was estimated for the linear regression model described above and identified two data points with a Cook’s distance over 0.4. After removing those two data points, the linear regression model returned a coefficient of 1.05, \(p < .001\), with \(R^2\) = 0.96. As the model fit was similar to that of the model including the two data points, we proceeded with the full data set, keeping the two potential outliers.
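The outlier check can be sketched with the standard Cook's distance formula specialised to a one-predictor regression through the origin, where the leverage of point i is x_i² / Σx² and the number of parameters is 1. This is a plain-Python illustration of the general technique, not the authors' code; the toy data are hypothetical, and only the 0.4 cutoff comes from the text. In the present context x would be DOM × NONDOM and y would be ObservedTE × LEARNABLE.

```python
def cooks_distances(x, y):
    """Cook's distance for each point in a no-intercept, one-predictor fit."""
    n = len(x)
    p = 1  # number of estimated coefficients (the slope only)
    sum_xx = sum(xi * xi for xi in x)
    slope = sum(xi * yi for xi, yi in zip(x, y)) / sum_xx
    residuals = [yi - slope * xi for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in residuals) / (n - p)  # residual variance

    distances = []
    for xi, e in zip(x, residuals):
        h = xi * xi / sum_xx  # leverage of this point
        distances.append((e * e / (p * s2)) * h / (1 - h) ** 2)
    return distances


# Hypothetical toy data: the last point has high leverage and a poor fit.
x = [1, 2, 3, 4, 10]
y = [1, 2, 3, 4, 30]
distances = cooks_distances(x, y)
flagged = [i for i, d in enumerate(distances) if d > 0.4]  # cutoff from the text
print(flagged)
```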
The number of simulated and observed translation equivalents plotted against each other. The dots represent the value of a child tested on the CDI, with their observed number of TEs and the expected number of TEs based on our model. The diagonal dashed line represents the case where the bias parameter equals 1 (BIAS = 1) such that the predicted and observed number of TEs are equal, and the solid blue line represents the model predictions.
Despite the good overall fit to the data, a close examination of Figure 6 suggested that the model might fit less closely the data of children with smaller vocabulary sizes. Figure 7 displays the model fit separately for children with a word vocabulary (WORD) of fewer than 300 words and those with a word vocabulary (WORD) of 300 words or more. Based on visual inspection, the slope of translation equivalent learning appeared steeper for children with fewer than 300 words, suggesting that translation equivalents are more easily learned (i.e., BIAS > 1), whereas the slope appeared to align with the Neutral Account of the Bilingual Vocabulary Model (i.e., BIAS = 1) for children with 300 words or more. To further explore this pattern, we ran the same linear regression twice, separately for children with fewer than 300 words and for those with 300 words or more. The model for those with larger word vocabulary (WORD) returned a coefficient of BIAS = 1.02, \(p < .001\), whereas the model for those with a word vocabulary (WORD) of fewer than 300 words returned a coefficient of BIAS = 2.22, \(p < .001\). Both models fit well, although a somewhat better fit was obtained for children with larger vocabulary size (\(R^2\) = 0.97) than for children with smaller vocabulary size (\(R^2\) = 0.88). Overall, this analysis suggests that translation equivalent learning for children with larger vocabularies corresponds best to the Neutral Account, but translation equivalent learning for children with smaller vocabularies corresponds best to the Preference Account.
The number of observed translation equivalents as a function of the number of expected translation equivalents under the Bilingual Vocabulary Model (represented by the blue solid line), plotted separately for children with a word vocabulary of fewer than 300 words (left panel) and for those with 300 words or more (right panel). The dashed diagonal line represents the case where the bias parameter equals 1 (BIAS = 1) such that the predicted and observed number of TEs are equal.
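The split analysis amounts to estimating BIAS separately on each side of the 300-word threshold. A minimal sketch follows, using the closed-form slope of a regression through the origin; the function names and toy children are hypothetical, and the toy TE values are generated with BIAS = 2 for small vocabularies and BIAS = 1 for large ones purely so the output can be checked.

```python
def fit_bias(rows):
    """No-intercept least-squares slope of TE * LEARNABLE on DOM * NONDOM."""
    sum_xy = sum((d * n) * (te * l) for d, n, te, l in rows)
    sum_xx = sum((d * n) ** 2 for d, n, te, l in rows)
    return sum_xy / sum_xx


def fit_by_vocabulary_size(rows, threshold=300):
    """Fit BIAS separately for WORD < threshold and WORD >= threshold."""
    small = [r for r in rows if r[0] + r[1] < threshold]   # WORD = DOM + NONDOM
    large = [r for r in rows if r[0] + r[1] >= threshold]
    return fit_bias(small), fit_bias(large)


def make_row(dom, nondom, learnable, bias):
    """Hypothetical child whose TE count follows the model exactly."""
    return (dom, nondom, bias * dom * nondom / learnable, learnable)


toy_rows = [
    make_row(60, 40, 300, bias=2.0),    # WORD = 100 (small vocabulary)
    make_row(100, 50, 300, bias=2.0),   # WORD = 150 (small vocabulary)
    make_row(270, 130, 600, bias=1.0),  # WORD = 400 (large vocabulary)
    make_row(400, 200, 600, bias=1.0),  # WORD = 600 (large vocabulary)
]

bias_small, bias_large = fit_by_vocabulary_size(toy_rows)
print(f"BIAS (WORD < 300) = {bias_small:.2f}, BIAS (WORD >= 300) = {bias_large:.2f}")
```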
The aim of the current study was to better understand translation equivalent learning in bilingual children, specifically investigating whether translation equivalents are harder (Avoidance Account), easier (Preference Account), or similarly easy (Neutral Account) for bilingual children to learn compared to singlet words (i.e., the first label for a particular referent). To test these accounts, we developed the Bilingual Vocabulary Model, which quantifies the number of translation equivalents that children produce as the product of the number of words they know in their dominant and non-dominant languages, divided by the number of words that are learnable at their developmental level. The inclusion of a learnability parameter was a unique aspect of our approach, and was crucial to quantifying how many translation equivalents versus singlets were available to be learned given the child’s age. The difficulty of learning translation equivalents relative to singlets was modeled via the bias parameter (BIAS), which indicated whether translation equivalent learning is consistent with the Avoidance (BIAS < 1), Preference (BIAS > 1), or Neutral Account (BIAS = 1).
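The verbal description of the model above can be written out as a short Python sketch. The argument names `dom`, `nondom`, and `learnable`, the helper `classify_account`, and its tolerance `tol` are hypothetical choices made for illustration; only the product-over-learnable structure and the interpretation of BIAS come from the text.

```python
def expected_translation_equivalents(dom, nondom, learnable, bias=1.0):
    """Expected number of TEs: BIAS * DOM * NONDOM / LEARNABLE.

    dom, nondom: words known in the dominant / non-dominant language.
    learnable:   words learnable at the child's developmental level.
    bias:        1 = Neutral, > 1 = Preference, < 1 = Avoidance.
    """
    if learnable <= 0:
        raise ValueError("learnable must be positive")
    return bias * dom * nondom / learnable


def classify_account(bias, tol=0.05):
    """Map a BIAS estimate onto an account (tol is an arbitrary assumption)."""
    if abs(bias - 1.0) <= tol:
        return "Neutral"
    return "Preference" if bias > 1.0 else "Avoidance"


# Hypothetical child: 100 dominant-language words, 50 non-dominant-language
# words, and 250 learnable words.
print(expected_translation_equivalents(100, 50, 250))            # neutral case
print(expected_translation_equivalents(100, 50, 250, bias=2.0))  # preference
print(classify_account(2.22), classify_account(1.02))
```

The sketch does not cap the expectation at the size of the smaller vocabulary, so very large BIAS values could yield expectations that no real child could reach.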
In Study 1, we simulated vocabulary and translation equivalent knowledge based on the Bilingual Vocabulary Model, and in Study 2 we tested three sets of model-generated predictions using archival CDI data from 200 bilingual children aged 18–33 months. All three sets of model predictions were confirmed in our empirical dataset.
Prediction Set 1 pertained to relationships between translation equivalent knowledge, vocabulary balance, and vocabulary size in the dominant and non-dominant languages. In both the simulated and observed data, children with more balanced vocabularies (i.e., those who produced a similar number of words in each of their languages) produced more translation equivalents. This pattern is consistent with reports from previous research (David & Wei, 2008; Legacy et al., 2016; Montanari, 2010; Pearson et al., 1995, 1997). Moreover, both the simulated and observed data showed that the children who produced more total words produced more translation equivalents, which is in line with previous research showing that the number of translation equivalents a bilingual child knows increases along with their total vocabulary size (Legacy et al., 2016; Montanari, 2010). Additionally, both our simulated and observed data showed that the more words children knew in their dominant language, the more translation equivalents they produced. This pattern is consistent with previous research reporting a positive correlation between the size of bilingual children’s dominant-language vocabulary and the proportion of translation equivalents (Poulin-Dubois et al., 2013; Legacy et al., 2015). Finally, both our simulated and observed data showed that the more words children produced in their non-dominant language, the more translation equivalents they produced. A similar pattern has been reported by Legacy and colleagues (2015), who found that vocabulary size in the non-dominant language correlated positively with the proportion of translation equivalents known by the child.
Prediction Set 2 pertained to the relationship between the number of potentially learnable words for a child (constrained by their developmental level) and the production of translation equivalents and singlets (i.e., words without a translation equivalent). We operationalized developmental level in terms of children’s age, and set the number of learnable words at the number produced by children at the 90th percentile for that age (averaged across French and English). Both simulated and observed data showed that older children had larger concept vocabularies, a pattern consistent with reports from previous literature (Pearson et al., 1993). Likewise, Prediction 2b was confirmed by the observed data, as older children produced more translation equivalents than younger children. This pattern is consistent with reports that bilingual children learn more translation equivalents as they grow older (David & Wei, 2008; Legacy et al., 2016). Predictions 2c and 2d were also confirmed by our vocabulary data. While children produced more singlets in both the dominant and non-dominant languages with age, the least balanced children produced the most singlets in their dominant language and the most balanced children produced the most singlets in their non-dominant language. These patterns are also in line with the notion that bilingual children learn words in proportion to their relative exposure to each language (e.g., Boyce et al., 2013; Hoff et al., 2012; Marchman et al., 2010; Place & Hoff, 2011; Pearson et al., 1997). Therefore, within the number of words that are potentially learnable at a particular developmental level, bilingual children with less balanced language exposure have more opportunities to learn words in their dominant language than in their non-dominant language, whereas bilingual children with more balanced language exposure have more equal opportunities to learn words in each of their languages.
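The operationalization of learnable words described above (the 90th percentile of production at a given age, averaged across French and English) can be sketched as follows. The sketch assumes access to raw word-production counts for same-aged children in each language; in practice the values may instead be read from published percentile norms. The function name and the toy counts are hypothetical.

```python
import numpy as np


def learnable_words(english_counts, french_counts, percentile=90):
    """Average of the per-language percentile of words produced at one age.

    english_counts, french_counts: word-production counts for children of
    the same age in each language (hypothetical norming samples).
    """
    english_value = np.percentile(english_counts, percentile)
    french_value = np.percentile(french_counts, percentile)
    return float((english_value + french_value) / 2.0)


# Toy norming samples for a single age group (not real CDI norms).
english_counts = np.arange(0, 101)      # 0, 1, ..., 100 words
french_counts = np.arange(0, 201, 2)    # 0, 2, ..., 200 words

print(learnable_words(english_counts, french_counts))
```

Changing `percentile` (for example to 95) shows how sensitive the learnability estimate is to this choice for a given norming sample.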
Overall, we observed a strong correspondence between the data simulated under the Bilingual Vocabulary Model and our observed data. Moreover, our model predicted numerous disparate patterns that have been previously reported in the literature.
Having validated our overall approach, we turned to Prediction 3, which used the Bilingual Vocabulary Model to quantitatively test three conceptual accounts of translation equivalent learning: the Avoidance Account, the Preference Account, and the Neutral Account. The number of translation equivalents children produced was a very close fit to the Neutral Account (i.e., translation equivalents are about as easy to learn as singlets), with this model explaining 96% of variance in the data. However, there was some indication that the Neutral Account provided a poorer fit for children with smaller vocabulary sizes. Modeling their data separately, we found evidence for the Preference Account: younger children at around 22 months appeared to learn translation equivalents more easily than singlets, whereas older children at around 28 months learned translation equivalents similarly to singlets. This could indicate a qualitative shift in word learning that occurs as bilingual children develop and learn more words, from the Preference Account to the Neutral Account. This pattern of a qualitative shift contradicts previous work proposing that bilingual children between the ages of 6 months and 7 years learn translation equivalents more easily than singlets (Bilson et al., 2015). The discrepancy could potentially be explained by the difference in how expected patterns of translation equivalent learning were simulated in each study. Previous approaches simulated bilingual language learning using data from randomly-paired monolinguals or lexicons of two different bilinguals as a reference point for the Neutral Account (e.g., Bilson et al., 2015; Pearson et al., 1995).
The Bilingual Vocabulary Model represents a significant theoretical and methodological advance, as it does not make reference to randomly-paired children, and instead uses children’s own dominant and non-dominant vocabulary size, together with their developmental level, to gauge how many translation equivalents they are expected to learn.
The developmental change in bilingual children’s ability to learn translation equivalents could be related to changes in children’s use of one-to-one mapping biases such as mutual exclusivity. As revealed by previous studies, younger children and children with smaller vocabulary sizes, and thus less vocabulary knowledge, seem not to have a strong bias for a one-to-one mapping between words and referents (Halberda, 2003; Lewis et al., 2020; Merriman & Bowman, 1989). In other words, children with less experience in word learning may be more inclined to accept multiple words for the same referent (Halberda, 2003; Merriman & Bowman, 1989). In contrast, children with larger vocabulary sizes appear to become more certain about the one-to-one mapping relationships between referents and words (Lewis et al., 2020), while at the same time they also take better advantage of their bilingual exposure to accept that referents can have different words between languages (Au & Glusman, 1990; David & Tell, 2005). At first blush, strengthening of one-to-one mapping biases over age could explain why younger children appear to learn relatively more translation equivalents than older children. Yet, this explanation would not predict that younger bilinguals’ data would follow the Preference Account as we observed, and might instead predict development from the Neutral to the Avoidance Account, before perhaps returning to the Neutral Account once children realize that each referent should have a label in each language. Thus, changes in one-to-one mapping biases do not provide a complete explanation for our results.
Another possible explanation is that the nature of bilingual input changes as children become more advanced word learners. Some recent research has suggested that bilingual parents sometimes code-switch to use a word that they know to be in their child’s vocabulary (Kremin et al., 2021; Nicoladis & Secco, 2000). For example, a caregiver may choose to say to their English–French bilingual child “Can you grab the livre?” if they know their child understands the French word “livre” but not the English equivalent “book.” This may provide fewer opportunities for children to learn translation equivalents, since they would be less exposed to the unfamiliar translation equivalents. However, this observation would predict that young bilinguals would know fewer translation equivalents as a proportion of their vocabularies than older bilinguals, which was opposite to what we observed. Thus, changes in bilingual input also do not provide an adequate explanation for our results of a qualitative change in translation equivalent learning. Overall, more research will be needed to understand why translation equivalents appear to be over-represented in younger bilinguals’ vocabularies.
Our Bilingual Vocabulary Model presented an integrated computational account of translation equivalent learning, focusing on the joint probability of learning the word for a concept in each language. To do so, our model parameters included the number of words produced in each language, as well as children’s developmental level. However, our model does not consider other qualitative factors including family socioeconomic status (e.g., Hoff, 2003; Fernald, Marchman, & Weisleder, 2013), parents’ interaction with their children (e.g., Blewitt et al., 2009; Yu & Smith, 2012), and the quality of parental language input over time (e.g., Raneri et al., 2020; Rowe, 2012). It would be interesting for future studies to incorporate such factors into a bilingual word learning model, including different amounts of input and the quality of that input. Such a model may better characterize and predict bilingual vocabulary development as a function of experience. Moreover, it would be important to extend our Bilingual Vocabulary Model to longitudinal data or data from a different bilingual population to investigate whether the qualitative shift in bilingual children’s ability to learn translation equivalents across development can be replicated.
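The joint-probability idea mentioned at the start of this paragraph can be illustrated with a small Monte Carlo sketch. It assumes that, under the Neutral Account, a child's known words in each language are drawn independently and uniformly at random from the same set of learnable concepts, so that a translation equivalent arises whenever a concept is known in both languages. This sampling scheme, the function name, and the toy values are assumptions made for illustration and are not necessarily the simulation procedure used in Study 1.

```python
import numpy as np


def simulate_mean_te(dom, nondom, learnable, n_children=2000, seed=0):
    """Mean number of TEs when words are learned independently per language.

    Each simulated child knows `dom` of the `learnable` concepts in the
    dominant language and `nondom` in the non-dominant language, sampled
    without replacement and independently across languages.
    """
    rng = np.random.default_rng(seed)
    counts = np.empty(n_children)
    for i in range(n_children):
        known_dom = rng.choice(learnable, size=dom, replace=False)
        known_nondom = rng.choice(learnable, size=nondom, replace=False)
        # Concepts known in both languages count as translation equivalents.
        counts[i] = np.intersect1d(known_dom, known_nondom).size
    return float(counts.mean())


dom, nondom, learnable = 100, 50, 250  # hypothetical values
simulated = simulate_mean_te(dom, nondom, learnable)
analytic = dom * nondom / learnable    # neutral expectation (BIAS = 1)
print(simulated, analytic)
```

Under these assumptions the simulated mean should sit close to DOM × NONDOM / LEARNABLE, which is the sense in which BIAS = 1 corresponds to independent learning across the two languages.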
Another limitation of our model is that it takes a somewhat simplified view of translation equivalents, assuming that children encounter the same conceptual categories in each of their languages and are exposed to the corresponding words. However, the reality of bilingual experience might be more complex. First, some concepts expressed as a single word in one language may be lexicalized by two words in another language (e.g., English has the single word “sister,” but Mandarin has separate words, “jiějie” [older sister] and “mèimei” [younger sister]). As another example, some words may not have a translation equivalent in the other language (e.g., the Japanese word “sushi” is borrowed into other languages). Languages may also categorize objects differently within conceptual categories (e.g., a shallow dish might be called a “bowl” in English but an “assiette” [plate] in French). There is mixed evidence for whether bilingual adults maintain separate (Jared et al., 2012) versus integrated (Ameel et al., 2009) conceptual representations across their two languages, and little to no data from bilingual children. Second, our model did not take into account that bilingual children appear to learn similar-sounding translation equivalents (i.e., cognates like the English–French pair “banana” – “banane”) more easily than those that do not share a similar phonological form (e.g., the English–French pair “dog” – “chien”) (Bosch & Ramon-Casas, 2014). Likewise, some bilingual children learn language pairs that share more cognates than others (e.g., Spanish and Italian share more phonologically similar translation equivalents than English and French; Schepens et al., 2013). While more research will be needed on how these factors impact bilingual vocabulary learning, the close correspondence between our model and data from bilingual children suggests that even if our assumptions are a simplification, deviations from these assumptions might have a relatively small impact.
Moreover, if they do prove to be important, such factors could be added to future iterations of the Bilingual Vocabulary Model.
Another assumption of our model was that bilingual children hear labels from both languages for the same set of referents. However, following the Complementarity Principle (Grosjean, 2016), bilinguals may have different experiences in each of their languages. For example, a French–English bilingual child who always spends bathtime with an English-speaking parent might encounter bath words primarily in English (e.g., “soap,” “bath,” “bubbles”), therefore having less opportunity to acquire their translation equivalents in French. At the same time, cross-linguistic data have provided evidence of a high degree of commonality in the first words children produce (e.g., Braginsky et al., 2016; Tardif et al., 2008). For example, words for important people (“mommy,” “daddy”), social routines (“hi,” “bye,” “yes,” “no”), and simple nouns (“ball,” “dog”) are among the first words produced by children across languages and cultures. It therefore seems reasonable to expect that bilingual children would be exposed to a similar set of referents and labels in each of their languages. Moreover, if indeed bilingual children tend to encounter different words in different linguistic contexts, we would have expected our data to be consistent with the Avoidance Account (i.e., fewer translation equivalents than expected), which is not what we observed. Nonetheless, future studies of bilingual corpora could directly address whether early translation equivalent learning might be impacted by the Complementarity Principle.
Finally, we must note the reciprocal relationship in the Bilingual Vocabulary Model between the bias parameter (BIAS) and the parameter that accounts for how many words are potentially learnable at a particular age (LEARNABLE). Under the Bilingual Vocabulary Model, the learnability parameter and the bias parameter jointly predict the number of translation equivalents that a child will learn based on the number of words that they know in each of their languages. That is, if the assumed learnability parameter changes by a factor of two (e.g., whereby only 122 words in each language are learnable for 18-month-olds, rather than 244), then estimates of the bias parameter will also change by a factor of two (i.e., rather than a parameter of 2.22, which supports the Preference Account, we would estimate a parameter of 1.11, which is closer to the Neutral Account). Our model estimated the number of learnable words to be the number that children at the 90th percentile at a particular age produce. Small changes to this approach (e.g., taking the number of words children at the 95th percentile produce) would likely not drastically alter our results, nor change the qualitative shift that we observed in our data. Nonetheless, future research will be needed to more precisely quantify the number of words that are learnable by particular children at particular ages.
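The trade-off between LEARNABLE and BIAS described above follows directly from the model's form, and can be checked with a few lines of Python. The sketch assumes the relationship observed TEs = BIAS × DOM × NONDOM / LEARNABLE stated earlier in this section; the function names and the hypothetical child are illustrative only, while the values 244, 122, 2.22, and 1.11 are taken from the text.

```python
def implied_bias(observed_te, dom, nondom, learnable):
    """BIAS implied by one child's counts for an assumed LEARNABLE value."""
    neutral_expectation = dom * nondom / learnable
    return observed_te / neutral_expectation


def rescale_bias(bias, old_learnable, new_learnable):
    """BIAS estimate after changing the assumed number of learnable words.

    With the observed data held fixed, BIAS scales in proportion to
    LEARNABLE, so halving LEARNABLE halves the estimated BIAS.
    """
    return bias * (new_learnable / old_learnable)


# Hypothetical 18-month-old: 60 dominant-language words, 30 non-dominant-
# language words, and 16 observed translation equivalents.
child = dict(observed_te=16, dom=60, nondom=30)
bias_244 = implied_bias(learnable=244, **child)
bias_122 = implied_bias(learnable=122, **child)
print(bias_244, bias_122, bias_244 / bias_122)

# The worked example from the text: 2.22 under 244 learnable words.
print(rescale_bias(2.22, old_learnable=244, new_learnable=122))
```

Because the two parameters cannot be separated from vocabulary counts alone, any conclusion about BIAS is conditional on the assumed learnability values.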
In sum, the acquisition of translation equivalents has been considered a special component of bilingual children’s vocabulary development. Previous research has put forward three diverging accounts of translation equivalent learning: the Avoidance Account, the Preference Account, and the Neutral Account. We proposed the Bilingual Vocabulary Model, which provides a quantitative way to test these accounts by modeling translation equivalent learning in relation to vocabulary size in each language and the number of potentially learnable words, which is constrained by children’s developmental level. Results using archival data from a large number of young French–English bilingual children showed that the data were a good fit to the Neutral Account, although younger children may show a preference for translation equivalent learning in line with the Preference Account. Moreover, our model parsimoniously explained previously disparate observations about bilingual children’s translation equivalent learning, for example that the number of translation equivalents children produce is tightly linked to their vocabulary size in their non-dominant language, and thus that, all else being equal, children with more balanced vocabularies will produce more translation equivalents. Future studies with data from other populations of bilinguals will be important to more fully test the Bilingual Vocabulary Model.